In commercial real estate, price indices are designed to show the current strength of the commercial real estate market across the United States. This project is an attempt to forecast the Commercial Real Estate Price Index (CREI). We have a monthly record of the SIOR CREI from January 2004 to December 2022. Created in 2014, the US CREI is one of the most regularly updated CRE price indexes on the market today. It was created by CRE Demographics, LLC, and examines eight economic drivers to determine relative market strength.
As a commercial investor, you may not be able to predict the market using a commercial real estate price index, but you can certainly get valuable insight into where it may be headed. Like other investors, many of the smartest commercial real estate investors attempt to “buy low and sell high.” For example, if property appreciation is your goal, you may be hesitant to buy after commercial real estate prices have risen for 3-4 years straight, as they could decline soon. However, individual property value is essential, and so are local market factors. Overall CRE prices are therefore just one piece of data that investors and developers can use to form a balanced perspective before making a business decision.
library(readxl)
library(tseries)
library(forecast)
library(TSstudio)
#setwd("C:/Users/WIN8/Desktop/PGDS 2020/FINA")
cre <- read_excel("Praxis workshop_MEV.xlsx", skip = 22, n_max = 1)
cre <- unlist(cre[4:ncol(cre)], use.names = F)
cre <- ts(cre, start = c(2004, 1), frequency = 12)
ts_info(cre)
## The cre series is a ts object with 1 variable and 228 observations
## Frequency: 12
## Start time: 2004 1
## End time: 2022 12
train <- window(cre, start = c(2004, 1), end = c(2017, 12))
test.data <- window(cre, start = c(2018, 1), end = c(2018, 12))
We would like to validate the model after fitting it. To do that, we'll split the data into train and test sets. Since we are building a time series model, we won't hold out a random test sample; instead, we pull out the twelve months of 2018 as the test split. Our training sample consists of monthly data from 2004 to 2017.
In this section, we will check the series for stationarity and see how to make it stationary if it is not. ARIMA modelling requires a stationary series, so this step is crucial. Let us start by taking a look at the data.
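The plotting command itself is not shown here; a minimal sketch of this step follows. The simulated series is an assumption standing in for the real `train` object so the snippet runs on its own.

```r
set.seed(1)
# Stand-in for the CREI training window (assumption: the real analysis
# uses the `train` ts object created above). A simulated trending
# monthly series lets this snippet run on its own.
train <- ts(100 + 0.5 * seq_len(168) + rnorm(168, sd = 2),
            start = c(2004, 1), frequency = 12)

# A persistent upward drift in this plot suggests non-stationarity
plot(train, ylab = "Index level", main = "Training series, 2004-2017")
```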
From the plot, we can see that the series clearly has a trend and therefore isn't stationary. Let us decompose the time series and check whether it also has seasonality.
autoplot(decompose(train))
We observe some seasonality in the data. We'll therefore apply successive orders of differencing and run the Augmented Dickey-Fuller test on each to choose the right degree of differencing.
for (i in 1:5) {
  differenced <- diff(train, differences = i)
  test_result <- suppressWarnings(adf.test(differenced))
  print(paste('Difference', i, ': ', test_result$p.value))
}
## [1] "Difference 1 : 0.022966502166445"
## [1] "Difference 2 : 0.01"
## [1] "Difference 3 : 0.01"
## [1] "Difference 4 : 0.01"
## [1] "Difference 5 : 0.01"
From the above result, we can conclude that first-order differencing is enough to make our series stationary (p-value below 0.05). We will use the differenced series from here on to build our model.
Let us take a look at what our dataset looks like after differencing.
Now that we have made our data stationary, let us identify the lag orders for our model. We will look at the ACF and PACF plots to figure this out.
train.diff <- diff(train, differences = 1)  # first difference, chosen above
acf(train.diff, lag.max = 20)
acf(train.diff, plot = FALSE, lag.max = 20)
##
## Autocorrelations of series 'train.diff', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
## 1.000 0.814 0.627 0.438 0.368 0.299 0.230 0.222 0.211 0.205 0.171
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## 0.143 0.111 0.149 0.186 0.221 0.183 0.140 0.101 0.072 0.047
pacf(train.diff, lag.max = 20)
pacf(train.diff, plot = FALSE, lag.max = 20)
##
## Partial autocorrelations of series 'train.diff', by lag
##
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 0.9167
## 0.814 -0.103 -0.123 0.231 -0.069 -0.085 0.235 -0.057 -0.044 0.059 -0.030
## 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## -0.054 0.267 -0.021 -0.039 -0.036 -0.029 -0.034 0.031 -0.024
The ACF and PACF plots of the first difference of the series indicate that an AR(1) process is appropriate for the differenced series, since the ACF tails off gradually while the PACF cuts off after the first lag. Therefore, we will fit an ARIMA(1,1,0) model to the CREI.
crei <- arima(train, order = c(1,1,0))
#Summary
summary(crei)
##
## Call:
## arima(x = train, order = c(1, 1, 0))
##
## Coefficients:
## ar1
## 0.8240
## s.e. 0.0429
##
## sigma^2 estimated as 2.631: log likelihood = -318.32, aic = 640.63
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1373424 1.617354 0.9624374 0.07708947 0.464824 0.4224866
## ACF1
## Training set 0.07465841
The summary provides the estimate of the AR(1) coefficient (0.824) along with its standard error.
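To make this concrete, here is a self-contained sketch: we simulate an ARIMA(1,1,0) process with an AR coefficient near the fitted 0.824 (the simulation is illustrative, not the actual CREI data), refit the model, and extract the quantities shown in the summary.

```r
set.seed(42)
# Simulate an ARIMA(1,1,0) process with ar = 0.82, close to the value
# estimated on the CREI training data (illustrative stand-in only)
sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.82), n = 300)

fit <- arima(sim, order = c(1, 1, 0))
coef(fit)["ar1"]                 # point estimate of the AR(1) coefficient
sqrt(diag(fit$var.coef))["ar1"]  # its standard error, as shown by summary()
```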
Let us now remove the seasonality of the data by seasonal differencing.
# Seasonal difference at lag 12 (note: differences = 12 would instead
# apply first differencing twelve times, which is not a seasonal difference)
train.diff.seasonal <- diff(train, lag = 12)
adf.test(train.diff.seasonal)
##
## Augmented Dickey-Fuller Test
##
## data: train.diff.seasonal
## Dickey-Fuller = -25.897, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
The ACF and PACF plots of the seasonally differenced series could suggest seasonal orders as high as P = 3 and Q = 4. However, ACF and PACF plots only give a general idea of the orders to try; they are not definitive. With some experimentation, we found that a single seasonal MA term (Q = 1, with P = 0) gives a better AIC for this dataset, so we opted for that instead.
acf(train.diff.seasonal, lag.max = 20)
acf(train.diff.seasonal, plot = FALSE, lag.max = 20)
##
## Autocorrelations of series 'train.diff.seasonal', by lag
##
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
## 1.000 -0.933 0.759 -0.526 0.288 -0.086 -0.059 0.146 -0.188 0.209 -0.226
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## 0.244 -0.258 0.260 -0.247 0.218 -0.177 0.132 -0.093 0.071 -0.065
pacf(train.diff.seasonal, lag.max = 20)
pacf(train.diff.seasonal, plot = FALSE, lag.max = 20)
##
## Partial autocorrelations of series 'train.diff.seasonal', by lag
##
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333 0.9167
## -0.933 -0.874 -0.466 0.047 0.115 -0.016 0.072 -0.160 0.084 0.059 -0.082
## 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
## -0.035 0.111 -0.039 -0.050 -0.011 0.104 -0.074 -0.008 -0.013
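The AIC experimentation described above can be sketched as follows. This is an illustrative run on a simulated seasonal series, not the actual `train` data, so the AIC values will differ from the real comparison.

```r
set.seed(7)
# Simulated monthly series with an ARIMA(1,1,0) core plus a seasonal
# cycle, standing in for `train` so the snippet runs on its own
y <- ts(as.numeric(arima.sim(list(order = c(1, 1, 0), ar = 0.8), n = 168)) +
          5 * sin(2 * pi * seq_len(169) / 12),
        start = c(2004, 1), frequency = 12)

# Compare AIC across candidate seasonal MA orders Q = 1..4
aics <- sapply(1:4, function(Q) {
  fit <- arima(y, order = c(1, 1, 0),
               seasonal = list(order = c(0, 0, Q), period = 12))
  AIC(fit)
})
names(aics) <- paste0("Q=", 1:4)
aics  # the candidate with the lowest AIC is preferred
```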
final.model <- arima(train, order = c(1,1,0), seasonal = list(order = c(0,0,1), period = 12))
summary(final.model)
##
## Call:
## arima(x = train, order = c(1, 1, 0), seasonal = list(order = c(0, 0, 1), period = 12))
##
## Coefficients:
## ar1 sma1
## 0.8527 -0.2405
## s.e. 0.0410 0.0919
##
## sigma^2 estimated as 2.52: log likelihood = -315.12, aic = 636.25
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1495787 1.582904 0.9359214 0.07981572 0.4504877 0.4108467
## ACF1
## Training set 0.06547105
Model checking involves examining the residual plot and its ACF and PACF, and checking the Ljung-Box test result. If the ACF and PACF of the model residuals show no significant lags, the selected model is appropriate.
checkresiduals(final.model)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)(0,0,1)[12]
## Q* = 27.193, df = 22, p-value = 0.204
##
## Model df: 2. Total lags used: 24
Overall, the residual plot and the Ljung-Box test indicate that the residuals are white noise. The ACF plot shows no significant lags. The Ljung-Box test provides another way to double-check the model: it tests whether the autocorrelations of a time series are jointly different from zero.
Since the p-value (0.204) is greater than 0.05, we cannot reject the null hypothesis that the residual autocorrelations are zero. Therefore, the selected model is appropriate for this dataset.
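As a self-contained illustration of the Ljung-Box idea, we can apply `Box.test` to pure white noise, which stands in here for well-behaved model residuals (for actual model residuals one would also pass `fitdf = 2` for the two estimated coefficients, as `checkresiduals` does).

```r
set.seed(3)
wn <- rnorm(168)  # white noise, standing in for the model residuals

# Ljung-Box test of joint autocorrelation over the first 24 lags
bt <- Box.test(wn, lag = 24, type = "Ljung-Box")
bt$p.value  # a large p-value gives no evidence of autocorrelation
```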
Let us forecast the next 12 months and plot the result.
plot_forecast(forecast_obj = forecast(final.model, h = 12))
Let us check the accuracy of our forecast.
accuracy(forecast(final.model, h =12), test.data)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1495787 1.582904 0.9359214 0.07981572 0.4504877 0.04439398
## Test set 3.8565369 5.712511 4.4226349 1.32647404 1.5276307 0.20978081
## ACF1 Theil's U
## Training set 0.06547105 NA
## Test set 0.79489494 4.97962
CREI_2004_to_2018 <- window(cre, start = c(2004, 1), end = c(2018, 12))
test_forecast(CREI_2004_to_2018, forecast.obj =
forecast(final.model, h = 12), test = test.data)
On the test set, the SARIMA model achieves a MAPE of 1.528 and an RMSE of 5.713.
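For reference, the two headline metrics reduce to simple formulas. The vectors below are tiny illustrative stand-ins, not the actual forecasts.

```r
actual <- c(100, 102, 105)  # illustrative actual values
fc     <- c(101, 103, 103)  # illustrative forecast values

rmse <- sqrt(mean((actual - fc)^2))              # root mean squared error
mape <- 100 * mean(abs((actual - fc) / actual))  # MAPE, in percent

c(RMSE = rmse, MAPE = mape)
```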
Let us compare our model with a simple STL model.
stl.model <- stl(train, s.window = 'periodic')
accuracy(forecast(stl.model, h = 12), test.data)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1256268 1.569209 0.9254087 0.06974381 0.4514411 0.04389532
## Test set 5.0596433 5.353604 5.0596433 1.75919320 1.7591932 0.23999632
## ACF1 Theil's U
## Training set 0.05931435 NA
## Test set 0.57585507 4.682191
test_forecast(CREI_2004_to_2018, forecast.obj = forecast(stl.model, h = 12), test = test.data)
Here the mean absolute percentage error (MAPE) is 0.45% on the training set and 1.76% on the test set. Comparing this with the accuracy observed for the SARIMA model (test MAPE of 1.53%), we can conclude that the SARIMA model gives us the better result.